CONVERGENCE OF NUMERICAL TIME- AVERAGING AND 
STATIONARY MEASURES VIA POISSON EQUATIONS 
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Abstract. Numerical approximation of the long time behavior of a stochastic differential equa- 
tion (SDE) is considered. Error estimates for time-averaging estimators are obtained and then used 
to show that the stationary behavior of the numerical method converges to that of the SDE. The 
error analysis is based on using an associated Poisson equation for the underlying SDE. The main 
advantage of this approach is its simplicity and universality. It works equally well for a range of 
explicit and implicit schemes including those with simple simulation of random variables, and for 
hypoelliptic SDEs. To simplify the exposition, we consider only the case where the state space of the 
SDE is a torus and we study only smooth test functions. However we anticipate that the approach 
can be applied more widely. An analogy between our approach and Stein's method is indicated. 
Some practical implications of the results are discussed. 
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1. Introduction. In many application one is interested in estimating the invari- 
ant measure of stochastic differential equation (SDE) by running a numerical scheme 
which approximates the time dynamics of the SDE. Two common approaches are to 
use the numerical trajectories to construct an empirical measure by the time averag- 
ing trajectories or by averaging many different realizations to obtain a finite ensemble 
average (see for example |30)). In either case one produces an approximation to the 
numerical method's invariant measure and is immediately presented with a number 
of questions including: (i) Does the numerical scheme have a stationary measure to 
which it converges quickly as time goes to infinity? (ii) How close is the numerical 
method's stationary measure to the stationary measure of the underlying SDE ? (iii) 
How close is the time-averaging estimator to the stationary measure of the underlying 
SDE? This article mainly focuses on the second two questions. 

The first question is addressed in many papers, including [3H|331[371[531[2n]- There 
are also a number of works where the second question has been considered. In |43) 
it was shown that several numerical methods for SDEs, including the forward Euler- 
Maruyama scheme, have unique stationary measures which converge at the expected 
rate to the unique stationary measure of the SDE. Moreover, in |45j an expansion of 
the numerical integration error in powers of time step was obtained which allows one 
to use the Richardson-Romberg extrapolation to improve the accuracy. These works 
discuss the convergence in the case of relatively smooth tests functions. In [51 [3], the 
authors use techniques from Malliavin calculus to establish smoothing properties of 
the discretization scheme. This allows one to prove the convergence of the averages of 
test functions which are only bounded. In turn, this shows that the distance between 
the numerical method's stationary measure is close to the stationary measure of the 
SDE in the total- variation norm. In [41] , a general method of deducing closeness of 
the stationary measure from an error estimate on a finite time interval is proposed, 
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though in general it does not provide optimal estimates of the convergence rate. In 
the earlier works a global Lipschitz assumption was imposed on the coefficients of the 
SDE which was hfted in gUl [HI SI [7] . The papers [SI US deal with elliptic SDEs, 
while the hypoelliptic case is treated in [2| [3| l44l [27l [26l [7] . 

In this paper we obtain error estimates for time-averaging estimators. The error 
estimates are given in terms of a term of the same order of magnitude as the weak finite 
time error of the numerical integrator used and the length of the simulation. Control of 
the time-averages is then used to prove closeness of the numerical method's stationary 
measure to the stationary measure of the SDE. The convergence rate obtained is the 
same as the weak convergence rate obtained on a finite time interval. This highlights 
the general principle that, for time-dependent problems, finite time approximation 
properties can be transferred to infinite time approximation properties in the presence 
of suitable stability. Here the underlying stability is the geometric ergodicity of the 
SDE. 

The main tool in our analysis is an associated Poisson equation for the underlying 
SDE. In this error analysis, the Markov chain generated by the numerical method need 
not be uniquely ergodic. It is shown that any stationary measure of the numerical 
method will be close to the unique stationary measure of the underlying SDE. We see 
the main advantage of the current analysis being its simplicity and universality. In 
particular, we obtain the error estimates in the case of numerical schemes with any 
reasonable simple random variables (including the discrete random variables used 
in weak Euler schemes [29]). Furthermore, our approach works equally well for a 
range of explicit and implicit schemes, and, perhaps more importantly, it works for 
hypoelliptic SDEs. It is comparatively short and the analysis leverages classical results 
on PDEs. It should be possible to carry over this approach to methods with adaptive 
step-sizes (see [HI [55]). It should also be adaptable to the SPDE setting, using the 
Poisson equation in infinite dimensions (see related applications of Poisson equations 
in 

Our goal here has not been to give the most general results. We have picked the 
simplifying setting of a compact phase space, namely the d-dimensional torus, and 
relatively smooth test functions. The extension to the whole space requires control of 
the time spent outside of the center of the phase space. Under additional assumptions 
in the spirit of [27], we believe that versions of the present results can be proven in 

In Section [21 we introduce the basic setting for the underlying SDE and discuss 
the hypoellipticity assumptions. In Section [S] we describe the class of numerical 
approximations we consider. We give examples of explicit and implicit methods which 
fit our framework. In Section 14.11 we discuss the auxiliary Poisson equation which 
will be used to prove the main results, and we give properties of its solution. In 
Section |4.2| we warm up by showing how to prove a law of large numbers, for the 
SDE, using an auxiliary Poisson equation. Section rS.ll contains the main results of the 
article which give a number of senses in which the numerical time- aver aging estimators 
are close to the corresponding stationary time average of the SDE. In Section 15.21 
we extend the results of Section 15.11 to numerical methods of higher orders and in 
Section 15.31 the Richardson-Romberg (Talay-Tubaro) error expansion is considered. 
Section [6] uses the results of Section [5] to give a quantitative estimate on the distance 
of the numerical stationary measures from the underlying stationary measure for the 
SDE. In Section 16.21 we make some general comments about analogies with Stein's 
method for proving the convergence of distributions to a limiting law. Some practical 
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implications of the results of Section [S] are discussed in Section [71 where we classify 
the errors of the numerical time- averaging estimators and, in particular, pay attention 
to the statistical error. 

2. SDE Setting. Let (il, J^, J^t, P), i > 0, be a filtered probability space and 
W{t) = {Wi{t), . . . , Wm{t))^ be an m-dimensional {J^t}t>o-adapted standard Wiener 
process. Consider the Markov process X{t) on the torus^T'^ whose time evolution is 
governed by the Ito SDE 

dX{t) = f{X{t)) dt + g{X{t)) dW{t) , (2.1) 

where / = (/i, . . . , f^Y : T'' ^ R"* and g: T'^ ^ R'*><™. We assume that / and g are 
Lipschitz continuous (and hence uniformly bounded) functions on T"*. Under these 
assumptions, equation (|2.ip has a unique pathwise global solution. The generator of 
the Markov process X{t) is 

£=7-V + ia:VV (2.2) 

d 



dxi 2 dxidx 



where a{x) — g{x)g^ {x). For a twice differentiable function (p: T'' — > R and x E T'^ 
we have 

(£0)(x) = /(x).V0(x) + ia(x) : VV0(a;) 



2 dxidxj 

The operator £ generates a strong Markov semigroup Pt, for t > 0, which maps 
smooth bounded functions to smooth bounded functions. It can also be defined by 
{Pt<t>){x) = 'Ejx4i{Xt) where Xg = x. By duality, Pt also induces a semigroup which 
acts on probability measures. In the name of notational economy, we will also denote 
this semigroup by Pt- 

The fc-th column of g, which we denote by g''^\ can be viewed as a function from 
T'^ BJ^. Hence . . . , ff^")} is a collection of vector fields on T'' and (PTjl 

can be rewritten as 

m 

dx{t) = f{x{t)) dt + Y. ff^'H^W) dWk{t). 

k=l 

This form of equation (12. ip makes it clear that there is a deterministic drift in the 
direction of / and independent random kicks in each of m directions {g^^\ . . . , g'™-'}. 
We will require that the randomness injected into the dynamics in the g'^^^ direc- 
tions effects all directions sufficiently to produce a "smoothing" effect at the level of 
probability densities. Uniform ellipticity is the simplest assumption which ensures 



^In physical applications a process on the torus can be of interest when periodic boundary 
conditions imposed on the problem. A typical example is a noisy gradient system which may be used 
to sample from the Gibbs' distribution with a periodic potential [38] . We also note that our results 
can potentially be applied to approximating Lyapunov exponents on compact smooth manifolds I14| . 
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"sufficient spreading" of the randomness. However, it is far from necessary and many 
important examples are not uniformly elliptic. For our purposes, it is sufficient that 
the system is hypoelliptic. 

The conditions given in our standing Assumption [U given below, are enough to 
ensure hypoellipticity. To state them, we need to recall the definition of the Lie- 
bracket (or commutator) of two vector fields. In our setting, given two vector fields 
h, h on T'' with h = {hi, . . . , hd)^ and h = {hi, . . . , hd)^ one defines the Lie-bracket 
as [h,h]{x) = {h.Vh){x) — (h.Vh){x). Hence the jth component of [h,h]{x) is given 
by X^iLi ^ ■ The vector [h,h]{x) may be thought of as the 

new direction generated by infinitesimally following h, then h, then —h and finally —h. 
Our most general assumption will be that the collection of all the brackets generated 
by the randomness spans the tangent space at all points. To track how the noise 
spreads, we introduce the following increasing set of vector fields 

Ao = span{/,g(i), . . . ,5^™^}, A„+i = span{/i, [h, h] : h £ A„, h £ A^} . 

The following is our basic assumption governing the stationary measure of the 
system, and its smoothness. 

Assumption 1. We assume that one of the following two assumptions hold: 
i) (Elliptic Setting) The matrix-valued function a{x) = g{x)g^{x) is uniformly 
positive definite: there exists a > so that, for all z £ IV^ and x £ , 
a|zp < a{x)z.z . 

a) (Hypoelliptic setting) The functions f and g are C°°(T'',R'^) and such that, 
for some n and all x £ , A„ — A„(x) — R'^ and (12.11) possesses a unique 
stationary measure. 

In either case in Assumption [TJ we have that (j2.ip has a unique stationary mea- 
sure, henceforth denoted by /i, which has a density with respect to Lebesgue measure 
on T"^. 

As already mentioned, the goal of this paper is to give a simple, yet robust, proof 
that time averages obtained using a class of numerical methods for simulating p.ip 
is close to the corresponding ergodic limits of (|2.ip . To this end, we now describe the 
class of numerical methods we will consider. 

3. Numerical approximations. We will begin by considering a class of simple 
numerical approximation of (j2.ip given by the generalized Euler-Maruyama method 
on the torus T'' : 



where A is the time increment, F: T'' x (0, 1) R'^ , G : T'^ x (0, 1) R''^", and 

Vn = {v 71,1-1 ■•■ 1 Vn^m 

y is an R™-valued random variable and {rin,i : n £ N,i £ 
{1, . . . , m}} is a collection of i.i.d. real-valued random variables satisfying 



■^We do not specify r in the statements of the forthcoming theorems since common numerical 
schemes use random variables with bounded moments up to any order. At the same time, in each 
proof of Section |5. II it is not difficult to recover the number of bounded moments required. 




(3.1) 



for a sufficiently large r > 20 More general methods will be considered in Section [521 
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For example, 771^1 ~ A/'(0, 1) satisfies these assumptions as well as iji i with the 

law 



P(7;i,i = ±1) = 1/2 . (3.2) 

We also note that the above two choices of 771,1 guarantee existence of finite moments 
of 771^1 of any order. Further, we assume that 77,1, n € N, is defined on the probability 
space (fi, J^, P) and is {J^t,,}-adapted, for i„ = nA. 

Since we want the dynamics of (|3.ip to be "close" to those of (|2.1I) . we make the 
following assumption on the "local error" of p.ip . 

Assumption 2. For some C > 0, all x e T^, and all A sufficiently small 

\F{x, A) - f{x)\ + \G{x, A) - g{x)\ < CA . (3.3) 

Under this assumptions we expect that X„ « X(tn) where as before i„ = tiA. In 
fact it follows from the general theory [29] that such numerical schemes have first-order 
weak convergence on finite time intervals. 

In analogy to (|2.2p , we define an operator associated to (|3.ip by 

£^(.)=F(.,A).V + iA(.,A):VV , 

where A{x, A) = G{x, A)G^ {x, A). While not the generator of the Markov process 
p.ip . it is the generator's leading order part since 

e(0(Xi)-(/)(j;o)) = (£^(/))(xo)A + 0(A2) as A^O. (3.4) 

Since (by Assumption [2]) we have that 

\C^(f>-C<f>\oo<CA\D^(f>\^ , (3.5) 

where D'^<j) is the second derivative, we deduce that 

e(^0(Ai) -(/)(xo)) (£(/))(xo)A + 0(A2) as A^O. (3.6) 

It is reasonable to expect that the distribution of the dynamics of the SDE and its 
approximation will be close to each other since the leading part of the generator of 
the approximation (|3.1I) is close to that of the original process (|2.ip . 

We close this section by giving two approximation methods which satisfy Assump- 
tion [2 

Example 3.1 (Exphcit Euler-Maruyama) . We define 

A„+i = A„ + /(X„)A + .g(X„)r;„+i VA (3.7) 

and hence F[x, A) — f{x) and G{x, A) = g{x). 
Example 3.2 (Imphcit split-step). We define 

^n+l = + fiXn+l)A 

Xn+1 = a;+i +5(A,*+i)77„+iv/A 
5 



and hence F{x, A) — f{y) and G{x, A) = g(y) where y is and element of {y £ : 
y ^ X + f{y)A} which is closest to x. 

Remark 3.3. From p.6p and the backwards Kolmogorov equation for (|2.1I) . 
Assumption\^is equivalent to 

E</.(Xi) - E</.(X(A)) ^ 0(A2) (3.8) 

for all sufficiently smooth 0, provided that Xq = X{Q). In the language of consis- 
tency and stability of a numerical method used in the introduction, p.6p and p.Sp 
provide an appealing way to characterize the degree of local consistency of a numerical 
method. This is will be the starting point for our discussion of higher order methods 
in Section \5.2[ 

4. Poisson equation. 

4.1. Background. It is a "meta-theorem" in averaging, homogenization and 
ergodic theory that if one can solve the relevant Poisson equation then one can prove 
results about the desired limit of a time-average. A central theme of this paper is 
to show how an appropriate Poisson equation can be used to analyze the long time 
average of a given function cj) : T"^ — R, evaluated along a numerical approximation of 
(j2.1|) and obtain information about its closeness to the corresponding ergodic limits. 

In this section, for motivation, we illustrate key ideas related to the Poisson 
equation, purely in the continuous time setting. To this end, recalling that /i is the 
unique stationary measure of (|2.ip and £ its generator, given (f>: T'' — > R wc define 
the stationary average of (f> by 

= / (t>{z)^l{dz) (4.1) 

and let ip solve the Poisson equation 

C^p^(j)-4>. (4.2) 

Under Assumption [U (|4.2p possesses a unique solution which is at least as smooth as 
ip. For fc e N, we denote by W^'°° the space of functions from T'^ to R such that the 
function and all of its partial derivatives up to order k are essentially bounded. We 
then have the following result whose proof we sketch. 

Theorem 4.1. Under AssumptionUS^, given any (j) e W^'°° , with fc e N U {0} 
there exists a unique solution G ^^+2,00 (|4.2p . Under Assumvtion [71fzl)j there 
exists a S > so that given any (j) G W'''°° , with fc G N U {n > 2}, there exists a 
unique solution ip e w''^^'"" to (|4.2p . 

Proof. Let A denote the Laplacian on the d dimensional torus. In either the 
hypoelliptic or elliptic case, one knows that for some positive a and C 

||(-a)>|U2 <c(||/:<^IU. + 

for all smooth </>. (See [T5] and recall that our space is compact.) This implies that C 
has compact resolvent and discrete spectrum (consisting of isolated points) . Thus C 
has a spectral gap and is invertible on the complement of the span of the eigenfunctions 
with zero eigenvalue. The space is nothing other than the functions which have zero 
mean with respect to an invariant measure for the corresponding semigroup Pt (i.e. 
/i such that Ptfi = /i or in terms of the generator C* fi = 0). Since the system has a 
unique invariant measure due to Assumption [U we see that 4> — (f) has zero projection 
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onto the space spanned by eigenf unctions with eigenvalue zero. Hence we know there 
exists a function u which solves (j4.2p weakly, by the Fredholm alternative. 

This leaves the regularity. In the uniformly elliptic case, see [171 [21] |42] . In the 
hypoelliptic case the results follow from Theorem 7.3.4 of [32] or [T^. This states 
that there is (5 > such that, if (/) G W'^^p, then u € W''+^'P for p £ [2, oo) and fc G N. 
Since our (j) e W'^'P for all p > 2, we know that u € np>i W'^'^^'P. Since our space 
has finite measure, we know that = f]^^^ w''+^'P. 

□ 

Remark 4.2. One of the principle technical issues that must he addressed in 
order to extend the results in this paper from the case to general ergodic SDE on 
R'' is proof of the existence of nice, well- controlled solutions to the above Poisson 
equation. Many of the needed results can be found in \35\ \ 3(i\ |37|/. 



4.2. A strong law of large numbers: an illustrative example. We now 

show how to use the Poisson equation to prove a law of large numbers for (|2.ip . This 
idea appears frequently in the literature [6l [131 [331 [311 [3H]- Nonetheless since this 
technique is central to our investigation of discrete time approximations to diffusions, 
we first highlight the main ideas in continuous time. However, before we begin it is 
worth noting that, at least formally, the solution to Poisson equation can be written 
as 

/•oo 

^{x)=- Ps{^ ~ 4>){x)ds . (4.3) 
Jo 

This can be interpreted as the total fluctuation over time of Ptcj) from (j)] and hence, 
it is not surprising that '0 can be used to control convergence to equilibrium. 

As stated in Theorem 14.11 when Assumption [T] holds, given any 0: T'* R 
with (/) G W'^'°° there exists a unique ^p: T"^ — > R which solves (|4.2p . Furthermore 
ip G W^'°° and hence Ito's formula then tells us that 

i:{X{t)) - xPix^) = f (0(A(s)) ~4>)ds+ [\vib){X{s)) . g{X{s))dW{s) . (4.4) 
Jo Jo 

Rearranging this, we obtain that 

l/V,x,.))*-^=*<iW)_*M_iM(<,. ,«) 

where M{t) = J*{Vi)){X{s)).g{X{s)) dW{s). Since ip is bounded, the first term on 
the right-hand side goes to zero as t — oo. To see that the last term also goes to zero 
as < — > cx) observe that 

lE(M(t)2) = lE(A/)(t)<^ 

for some K > independent of time since V'0 and g are both bounded on T''. More 
precisely, we have shown that for any initial xq 



E 



(- <PiXis))ds ~ < - , (4.6) 



which is a quantitative version of what is often called the mean ergodic theorem. 
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We can view i /J" (/)(X(s)) ds as an estimator for (j) and it follows from (|4.5I) that 



(4.7) 



One can also show that 



Var - / 0(X(s))ds =01 -) . (4.8) 



The estimate (|4.8p easily follows from (j4.6p and (|4.7p but can also be obtained directly 
from ()4.5p using an additional mixing condition. 

To obtain an almost sure (a.s.) resullH, for any e > we introduce A(r;£) = 
{^sup«y |M(t)| > T'^~2}. By the Doob inequality for continuous martingales and 
the factlhat E|M(r)|2 < KT , we get 

Hence 

^ P(A(2";£)) < oo 

riGN 

and the Borel-Cantelli lemma implies that there exists an a.s. bounded random 
variable C{uj) > and hq so that for every n e N with n > uq one has 

1 C(lu) 
— sup M(t < , . 

Hence with probability one, for every t S [2", 2"+^) with n > tiq one has 

1,, r. M 1 M 2 2C(a;) 2C(w) 

- A/ i) <— sup Af s < -— - sup Af(5) < , < -—pT 



By combining this estimate with (|4.5p and the fact that ip is bounded, we see that for 
any e > and for every t > 2"° one has 



1 



T 



T 







-I ^{X{s))ds~<p <^^ + j;^^ a.s. (4.9) 



for some a.s. bounded C(a;) > 0. We note that (14.91) implies that for any initial xq 

1 /•* 

lim - / (t>{X{s))ds^(t> a.s.. (4.10) 

i^OO t Jq 

In other words, the strong law of large number holds starting from any initial xq. 
Our assumptions are sufficient to ensure that has a unique stationary measure 
fx. Hence, it follows from BirkofF's ergodic theorem that ()4.10p holds for /i-a.e. initial 
xq. The above argument not only shows that the result holds for all xq, but also gives 
quantitative estimates on the rate of convergence. 



•^This result can also be obtained using Kronecker's lemma (see, e.g. [12]). Though the approach 
we follow gives explicit error estimates which can be useful. 



5. Error analysis for the numerical time-average. 

5.1. First order accurate schemes. We now wish to generalize the calcula- 
tions in the previous section to prove the closeness of the time averages obtained via 
the Euler approximation (|3.1[) to the stationary averages of (12.11) . We consider sam- 
ple path estimates in this section and then convert these to distance estimates in an 
appropriate metric, in Section [6l 

Let T = NA. Introduce the estimator (discrete time-average) (j)N for the station- 
ary average 0: 

N-l 

= N '^(^») ' (5-1) 

n=0 

where X„ is the SDE approximation defined in p.ip . 

We start by proving a mean convergence result, followed by and almost sure 
theorems. 

Theorem 5.1. Let Assumptions[l\and[Mhold. Let H denote W^'°° in the elliptic 
setting and in the hypoelliptic setting. Then for any 4> & H, one has 



N 



<C(A + i), (5.2) 



where T = NA and C is some positive constant independent of A and T. Further- 
more, the constant C is a linear function of \(f)\H and otherwise independent 0/00 

Proof. Under either of the conditions from Assumption [TJ we know from Theo- 
rem l4.1l that the solution ip of (|4.2[) is in W^'°° . In the interest of clarity and definite- 
ness, we will proceed under the first condition in Assumption [1] (the elliptic setting) 
where the jD^loo < C|L»2^|oo, \D'^i>U < C\D^\^, |/?V|ooV|i??/'|ooV|V'|oo < C\^\^. 
In the hypoelliptic setting, the fc*'' derivative of ip is bounded by the same derivative 
of 4>, necessitating the change in definition of H between the elliptic and hypoelliptic 
cases in the statement of the theorem. 

For brevity, we write (/)„ = 0(A:„), F„ = F{Xn,A), Gn = G(X„,A), tpn = 
tpiXn) and D^^n = {D^^){Xn) where {D^il)){z) is the kth derivative. We write 
{D^il}){z)[hi, . . . , hk] for the derivative evaluated in the directions hj. Defining 5„ =^ 
Xn+i - Xn = AFn + \/AG„?7„+i, wc havc 

lAn+l = l/'n +£'V'n[^«] + ^£'Vn[4,4] + ^ Z?^ V^n , ^„ , (S„] + i?„+l , (5.3) 

where 

Rn+l ^ y S^D'^i^isXn + (1 - s)A„+i) ds^ [5n,5n, ^n, ^n] 

is the remainder given by the Taylor theorem. Hence, 

i^n+l = + AC^-iIj,, + A^ Di^n[Gnrin+l\ + Ai D'^lJ;n[Fn,GnTln+l\ (5.4) 



''We indicate the dependence of constants on </> since this dependence is used in Section |6] 
Obviously, all the constants appearing in the statements of this and forthcoming theorems also 
depend on the coefficients of the SDE 1 12.111 and on the numerical method used. 
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r„+i = ^fL'^'0„[G„r/„+i,G'„77„+i] - A{x, A) :W-ip., 



where 

2 

Summing (|5.4p over the first iV terms, dividing by A^A and using (|4.2p . produces 

1 iV-l _ w-i 3 

^(V-AT-Vo) = - ^(0„~^) + - ^(/:^-£)^„ + ^5](M,^ + 5,,^), (5.5) 

Ti— n— 2—1 

where 

AT-l Af-1 JV-1 

n— n— n=0 

A^ ^"^ ^"^ 1 ^"^ 

n— n=0 n— 

We will find it convenient to further decompose 
where 

N-1 

n 

77„+l, G„ 77,1+1, G„77„+i] + SAZJ^Vnl^'n, -Fn, G 

n=0 

Notice that E[r„+i| J"t„] = and E[?7„+i| J"t„] = and E[?7„+i,i r]n+ij ?7„+i,fc| J"t„] = 0. 
Then it is not difficult to see that Mi^k, i = 0, . . . , 3, are martingales with respect to 
{J'tk} and, in particular, EM^^fc — for any k. 

Since / and g are uniformly bounded, p.3p implies that Fn and G„ are uniformly 
bounded in n. Recall that we also know that tp and its first four derivatives are 
uniformly bounded. Hence the Si^N are bounded as follows: 

N-1 N-1 

\Si,n\ < X CMoo = Gi|,^UAT, E\S2,n\ < ^ E|i?„+i| < C2\D^cl)\^AT, 

n=0 71=0 

(5.6) 

N-1 

nSo,N\ < X Co\D^\oo = Co\D^\ooAT, 

n=0 

where the Ci are positive constants which we have labeled for future reference. Simi- 
larly, we have (cf. p. 5^ *1: 

N-1 N-1 

I X(£^-/:)V„| < J2 CMooA = C^l^l^AN . 

n=0 n=Q 

Notice that this bound and the above bound in Si,n are a.s. bounds with deterministic 
constants. Lastly observe that \tpN — 4'o\ < 2|(/)|oo. Applying all of the preceding 
estimates to (|5.5I) produces the quoted result. □ 
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Theorem [Q implies that (of. (|i?7|) ): 

Bias(0w) -0(a + ^). (5.7) 

We now consider an convergence result, related to the mean ergodic theorem 
(|4.6|) in the continuous case. 

Theorem 5.2. In the setting of Theorem \5.1\ for any (j) G H, one has 

e(0w-0)' <c(A2 + i), (5.8) 

where T = TV A and C is some positive constant independent of A and T . Further- 
more, C depends on 4> only through \(j)\H and does so linearly. 

Proof. As in the proof of Theorem 15.11 we provide details in the elliptic case; 
in the hypoelliptic case the only change is that higher derivatives of (j) appear in the 
constants. We begin with (|5.5p from the proof of Theorem 15. II and obtain 

"f^- ' ^ Ce{ <*iL_J«! + - (5.!,) 

n=0 ^ n=Q 

where C is an ever changing constant. Observe that it follows from (15.61) that |S'i,Arp < 
C^I^I^A^T^. Further, by reasoning similar to that used in getting (|5.6p we have 

N 

ESIj, < mn\\Rk\ < C\D'^\lA'T^ , 

n,k—l 

N-1 

k,n=Q 

Since Mi^jv is a martingale, we get 

n=0 n=0 

Similar reasoning and the bound on ETyf^ give 

EMi^ < A E mi^c\cp\iT, EM|_^ < A^ E mi^mi^^T, 

n=0 n=0 

N-1 

EM2^ < A3 E C\D4,\l=C\D^\lA'T . 

Tl = 

Using the bounds on | J2n=Q ~ ^'^"^ IV'JV — V'ol from the proof of Theorem l5.ll 

and all the above inequalities, we estimate the right-hand side of (j5.9p and arrive at 
the quoted result. □ 
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We now prove an almost sure version of the preceding two results. Its relation to 
Theorem 15.21 is the same as the relation of (14. 9p to (|4.6p . 

Theorem 5.3. In the setting of Theorem \ 5.1l fixing an L > there exists a 
deterministic constant K (depending lineally on L) so that for all (j) with \\4>\\h < L, 
A sufficiently small, positive e > 0, and T sufficiently large one has: 



t>N - 4> 



<i^A + ^^ a.s., (5.10) 



where T = NA and C'{ll)) > is an a.s. bounded random variable depending on e and 
the particular (j). 

Proof. As in the proof of Theorem 15.11 we provide details in the elliptic case; 
in the hypoelliptic case the only change is that higher derivatives of (p appear in the 
constants. Starting from (j5.5p . we have 

JV-l I , I N-l 2 3 

Recall that (see the proof of Theorem 15. ip 

N-l 

I^^AT-V'ol < 2|(/.|oo, E K'^'^-'^)V^»I ^^4|</'|oo AiV, |5l,Ar| <Ci|0U AT, 
ji=0 

N-l N-l 

|52,iv| < i^A^ ^ |77„+i|^ + i^A47V, \SoM<KA''Y,\T^r.+i\' + KA^N, 

where K is an ever changing positive deterministic constant, independent of A and 
N. Due to the strong law of large numbers, the sum 'Ylin=o kn+i|^ ^-S- converge^ 

|4 



to E |?7i| and thus for almost every sequence rji, r]2, ... and all sufficiently large N 
we have J2n=o l'7n+il'' < ^or some deterministic K > 0. Hence 

n=0 

where Qn = y X^Lo I^^^a^I- 

Now we analyze 8 at . We have for r > 1 

<^EE|M,,^r 

(here K depends on r). Recall that M^ fe are martingales. If we can prove that 
EjMi^Arp'" < CN^ for all r sufficiently large then the proof can be completed with the 
aid of the Borel-Cantelli lemma as in Section 14.21 

We will the provide argument for estimating E|M2,Arp'', the other terms are esti- 
mated analogously. We re-write 



N-l 

M2.N - VA E D^n[GnVn+l] - %/AM2,W 



^Note that in the case of i from 1 13.21 1 this sum is equal to m^. 
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We have M2,i — and for an integer r > 



(5.12) 



2r 



1=2 



Note that the second term is equal to zero. Indeed 

E (^M^^-^DMGkVk+i]) = E (Mffe-^E {D^GkVk+i] |-^tj) = . (5.13) 



Using the elementary inequality 



b>0, > 1, i + 1 = 1, 

p q 



we get 

E{\M2,k\''^-'\DMGkVk+i]\') < ^ 



1 f2r-l. ~ 



E 



2r l^^.feP'^ + -\DMGkrik+iTN^''"j , 

(5.14) 



Z = 2,...,2r 
The relations ([gl^ - (|ETi)) imply 



if. 



EMffc+i < EMfJl + -) + i^TV 



r — 1 



whence 



Note that by Jensen's inequality this inequality holds for non-integer r > 1 as well. 
Therefore, 



1 |2r A'' 

^E|M2,Ar| = ^E 



2,7V 



2r K 
< 



Analogously, we obtain 



±-E\M,,^r<A^^, _l_E|Af3.^|2'-<A2^^, _Le|Mo,^|^^<A^^^. 



Thus, 



Ee?; < 



K 



(5.15) 



The Markov inequality together with (|5.15p implies 



\ — < A2''TiV2''T(Ee^) < KT^'^^^-^l 
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Then for any 7 — 1/2 — £, with e > 0, there is a sufficiently large r > 1 such that 
(recall that A is fixed here and T = AA^) 



CO / \ 00 

Hence, by the Borel-Cantelli lemma, the random variable <r supj^^Q T'^'Gjv is a.s. 
finite which together with (|5.1ip implies (|5.10p . □ 

We note that Theorems 15 . 1115 .31 do not require the Markov chain X„ to be ergodic. 
It is, of course, possible under some additional conditions on the numerical method 
to prove its ergodicity (see for example [331 1131 HZj). If the limit limjv_j.oo (f^N exists 
and is independent of Xq (i.e., if the Markov chain X„ is ergodic) then it follows from 
Theorem 15.31 that limAr_j.oo \4'n — ^| ^ ^A a.s., which is consistent with the results 
of, for example, [33]. 

Remark 5.4. In the series of papers [121 [M [211 [25!/ the authors consider 

the following weighted estimator for the stationary average 4> ■ 

7 E^=l«^«0(^n) 

"PN = — =^jv ' 

where Xn are obtained by an Euler-type scheme with a decreasing time step A„ such 
that An — > as 71 —> 00 while positive weights Wn are such that ^^=1 Wn ^ 00 as 
N OD. In particular, they proved that for soTue EulcT-type schcTiics and Wn — — 
Aon-", as [1/3,1), 



Bias (^(t>N^ - 
Var (^4>N^ 



0, a e (1/3,1) 
0{1/N^^^), a = 1/3, 

1 

ATI-" 



i.e., roughly speaking, the error of the estimator (j)^ under the optimal choice of the 
parameters is 



ON 



o 



1 



iVl/3 



We also note that the authors of \2S[ proved a.s. convergence of weighted empirical 
measures based on Euler-type schemes with decreasing step to the invariant measure 
of the corresponding SDE exploiting the Echeverria- Weiss theorem, which differs from 
the approach used in this paper. 

Let us briefly compare the estimators (pN and For the estimator (p^ from 

i5.1]) with equal weights Wn = 1 and with Xn obtained by an Euler-type scheme with 
constant time step A, we can say that the error is (cf. Theorems \5.S\ and \5.3\) : 



If we fix the computational costs (i.e., N and an Euler-type scheme) then the asymp- 
totically optimal choice of A for (f>N is N~^^^ in which case O (A + l/N^^^A^^'^'^ = 
O (l/iV^/'^) . It is interesting to note that these optimal orders of errors of both esti- 
mators are the same. At the same time, we should emphasize that the term with A 
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in 115.16]) (see also KA |5. 7| j and i5.1U\) ) is related to the numerical integration error 
while the term with l/^/T = l/N^^'^A^^^ is related to the statistical error (see also 
Section^. In the case of large scale simulations (like those in molecular dynamics) 
the numerical error is usually relatively small thanks to the existing state of the art 
numerical integrators and the statistical error prevails. In such common in practice 
situations one chooses A and N to appropriately control the corresponding errors ( see 
further discussion in Section^. 

5.2. Higher order schemes. Wc now consider a more general approximation 
than l|3.ip . In addition to the assumptions made in Section [5] we assume in this section 
that the coefficients of the SDE (|2.1I) and the function (j) are sufficiently smooth. 

Given a function S : T*^ x (0, 1) x R™ -> T'* and a sequence of R™-valued i.i.d. 
random variables — {£,n,i, ■ ■ ■ ,£,n,m) ■ n G N}, we define a general numerical 
method 

{Xn+l = Xn + S(Xn, A,^n+l) , /r i 

Xq=X. 

We assume that the ^„ have sufficiently high moments finite. Clearly our previous 
class of methods fits in to this framework as well as a number of new methods such 
as implicit Euler. Guided by p.8|) we make the following general assumption about 
()5.17|) after which we will state some easier to verify conditions which are equivalent. 

Assumption 3. For all A e (0,1) sufficiently small, and all (f> e p^2(p+i),oo 
Xa = X{0) then 

E <f>{Xi) - E <^(X(A)) = 0{AP+^), (5.18) 

where the constant in the error term is uniform over all (p with ||(^|j^y2(p+i).oo < 1. 

To complement the increments of the numerical method 5 defined above we now 
define the 5 increments of the SDE. Namely for any x € T'', we define (5 : x (0, 1) x 
17 ^ T"* by 

(5(x, A;a;) = X(A;w) - (5.19) 

where X{Q;uj) — x and X{t;Lu) solves (|2.1|) . The following proposition, whose proof 
is given at the end of the section, enables Assumption [3] to be verified. 

Proposition 5.5. Assume that for some p G N, there exists a positive constant 
K so that for all A G (0, 1) sufficiently small: 

s s 

sup |e([]5,, <ifAP+i, s = l,...,2p+l, (5.20) 

{ai,...,as) 



.e{i,...,d} 



i=i 



2p+2 

sup E Jl < i^AP+1 . (5.21) 

(Ql,...,a2p + 2) 

aie{l,...,(i} 

Then the method satisfies Assumption\^with the same p. 

We note that examples of second-order weak schemes [p — 2) which satisfy As- 
sumption [3] can be found in many places including [521 P- 103] and [T]. Higher order 
methods also exist . We also note that it follows from the general theory p. 
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100] that the numerical schemes considered in Proposition [53] have weak convergence 
of order p on finite time intervals. Now we prove a mean convergence result, which is 
analogous to Theorem IS.ll in the case of Euler-type methods 

Theorem 5.6. Let Assumptions\^and\^hold. Let H denote W^P'°° in the elliptic 
setting and in the hypoelliptic setting. Then for any (f) £ H with \(t>\H < Ij 

consider (j)N defined by (15. ip where Xn is generated by the numerical method from 
(|5.17p rather than (|3.ip as previously. Then 



'E(f>N - <p 



< C 



(a'' + ^), (5.22) 



where T = NA and C is some positive constant independent of A and T . Further- 
more, the constant C is a linear function of \<j)\H and otherwise independent of (p. Ln 
other words, 

Bias(0jv) = o(A?' + l) . (5.23) 

Proof, (of Theorem I5.6|) As before, let -if) be the solution to the Poisson equation 
associated to 4> given in (|4.2p . Define ■0„ = '0(-^n) and let X{x^t) be the solution to 
(|2.ip at time t with initial condition x. From our assumptions, we have that 

= E7/^(X(X„, A)) + 0{/XP+^). (5.24) 

Rewriting E7/;(X(X„, A)) in (|5.24p via the Taylor expansion of expectations of SDE 
solution ('29', Lemma 2.1.9, p. 99] or [19]), we arrive at 

P Afc 

EV'n+i = E7^„ + ^ — E (/:V^) {Xn) + 0(Af+i). (5.25) 

fc=i 

Summing (|5.25p over the first N terms and dividing by A^A, we obtain 



1 ^^-1 P Afc-l 

T -^EE('C^)(^n) + ETQ^- + 0(^')' (5.26) 

ri=0 fc=2 

where Qk — ^ J2n=a ^ {^^^) (A,i). Using (I4.2p and boundedness of ■(/', we have after 
rearrangement of (|5.26p : 

|-EE0(A„)-0| <^^|Q,|+ifAP + -. (5.27) 

n=0 k=2 

Applying analogous arguments to C'^'^^jj, fc = 2, . . . ,p, as we did for tp in (15.261) . we 
have 



'—k 



Therefore (cf. (I071) '). 



|Q.|< E (-^^IO.|+i^A^+^-'= + ^, fc = 2. 
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and, in particular, \Qp\ < KA + K/T. Hence \Qk\ < KAp+^-^ + K/T which together 
with ^^jm imphes (jf^ FI □ 

Remark 5.7. If we substitute Xn from the method (|5.17l) in from (j5.1l) . it is 
also possible to prove ( analogous to Theorem 1 5. ^|) t/iat 



<t>N - (t> 



< KAP 



yl/2-e 



(5.28) 



Proof, (of Proposition l5.5l) We note that under the assumptions made the solution 
ij) of the Poisson equation (|4.2[) is sufficiently smooth. Expanding ipn+i = i^iXn+i) 
in powers of 5„ X^+i — X„ and taking expectation, we get 



■Ei?2p+iv,„[4,...,4] + 0(AP+i), 



(5.29) 



(2p+l) 



2p+l 



where the remainder |0(A''"'"^)| < KA'P'^'^ with independent of A and n. If we 
replace 5n by (5„ defined in (I5.19[) in (j5.29p then, using the conditional version of 
()5.20p . we obtain (with a different remainder 0{Ap^'^) than in (|5.29p ): 



+ 1 



+ 



.EZ?2p+i^„[^„,...,5„] + 0(AP+i). 



(5.30) 



(2p+l) 



2p+l 



It is not difficult to see that the right-hand side of (|5.30p coincide with expectation 
of the Taylor expansion of il){X{Xm A)) around Xn up to a reminder of order Ap+^. 
Hence 

E^„+i = E^((X(X„, A)) + 0(Af+i) 
and the proof is complete. □ 

5.3. The Richardson-Romberg (Talay-Tubaro) error expansion. In this 
section we consider an expansion of the global error E(/)Ar — in powers of the time step 
A analogous to the Talay-Tubaro result 53- As in the previous section, we assume 
here that the coefficients of the SDE p. II) and the function (f) are sufficiently smooth. 
Note that we obtain the expansion both in the elliptic and hypoelliptic setting. 

Theorem 5.8. Let Assumptions^^ and\^ hold. Let q be a positive integer and H 
denote Vr2(p-i-g-t-i),oo elliptic setting and VF^'^p+'^+^^'°° in the hypoelliptic setting. 

For any (j) & H with \4>\h < 1; consider (pN defined by (j5.ip where Xn is generated by 
the numerical method from (|5.17p . Then 



E0Ar 
where T — NA\ 



O AP+«+i + - 



CgAP+'^ + O AP+'+i 



(5.31) 



< K I Af+«+i + - 



^To help with intuitive understanding of the proof, we remark that J C^ip{x) dii(x) = which is 
approximated by with sufficient accuracy. 
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and the constants Cq, . . . ,Cq, K are independent of A and T and they are linear 
function of \4>\h ^.i^d otherwise independent of (p. In other words, 



Bias(0w) = C^AP + ■■■ + CgAP+« + O j^A^+^+i + i 

Proof. Here we make use of the Poisson equation again and also exploit an idea 
used in the proof of the Talay-Tubaro expansion in the finite time case from [291 PP- 
106-108]. We prove the theorem in the case of p = 1 (i.e., for Euler-type schemes) 
and g = for the clarity of the exposition. It is not difficult to extend the proof to 
arbitrary p,q > 0. 

In the case of p = 1 for a particular Euler-type scheme, we can write (cf. (j5.25l) ): 

EV'n+i = EV'„ + AE (£V) (Xn) + A2eA(X„)+0(A3) , (5.32) 

where A{x) is the coefficient at A^ in the corresponding expansion of Eipi at Xq = x. 
For instance, in the case of the explicit Euler-Maruyama scheme p.7|) 



1 A ^ ^2,/; 1 ^ ^ 937/; 1 ^ 



2 /i/ia a +2 2^ dxidxjaxt 2i ^ """" dxidxjaxtdx,\ 

t,J — l ■' i,j,k—l i,j,k,l—l I 

where all the coefficients and the derivatives of the function ip are evaluated at x. We 
do not need the explicit form of A{x) for the proof. 

Summing (15.321) over the first N terms, dividing by A^A, using (|4.2I) and rear- 
ranging the terms, we obtain 

1 A 

E<Pn-4'= ^(EVat - V'o) - ^ 51 EA(X„)+0(A2) (5.33) 

n=0 

71=0 ^ 

Due to Theorem 15. 11 we have 

where the constant A is the stationary average of A (see (|4.ip '). The expansion (|5.3ip 
with p=^l, q^Q and Co = -A follows from and ((OI)) . □ 

6. Error analysis for the numerical stationary measures. 

6.1. Distances between true and approximate stationary measures. We 

now use the results of the previous section to prove that any stationary measure of the 
numerical method is close to that of the underlining SDE. The existence of stationary 
measures for the numerical method follows by the Krylov-Bogoliubov constructionLj 
We have assumed in (j3.4l) and (j3.6l) or Assumption [3] that the finite time dynamics 
of our method and SDE are close. This can be seen as a form of "consistency" . It is 



^The fact that our state space is compact ensures that the time-averaged transition measure 
forms a tight family of probabihty measures I17| . 
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reasonable to expect that the longtime behavior will be close since our setting of the 
torus provides the necessary "stability" through ergodicity. 

We begin by giving a metric in which we will measure the distance between 
measures. If fj.'^ is a stationary measure of the numerical method, then for any 
bounded function 6 : T"^ — ^ R and n G N : 



E,0(X„)Ai^(dz) = / (6.1) 

where denotes the expectation conditional on X(0) = z. Fixing an integer p > 1, 
we define the metric p between two probability measures on T*^ by 



p{vi,V2) = sup yj (j){z)iyi{dz) ~ J (p{z):^2{dz) j , 

where H = {(j) : T'' ^ R : < 1} and H = M^^p.oo ^^le elliptic setting 

and H = ]4/2(p+i),oo -^^ ^j^g hypoelliptic setting. Observe that since (p € H implies 
— G "H, one also has the equivalent, and sightly more standard, characterization of 
p given by 



(f>{z)iyi{dz) - / (j){z)v2{dz) 



p{vi, V2) = sup 

Theorem 6.1. Defining the metric p as above for some integer p > 1 and assume 
that either: 

i) p — 1 and Assumptions[l\ and\^hold; or 
a) A ssumptions [7] anrf hold. 

then there exists a positive C so that if /i'^ is any stationary measure of the 
numerical method (|3.ip then 

p{p^,p)<CAP. 

Proof. Since p,^ is stationary, we have that J (j){z)p^{dz) — J E,z4>{Xn)p^{dz) 
for any n > and hence 



J <P{z)p^{dz) = / ^ E ^z^iX„)t^^idz) 



Then 



J (l){z)p^{dz) ~ J <P{z)p{dz) = y E E.0(^«) - 0] l^^idz 



ri=0 
N-1 



where in the last estimate we have invoked either Theorem 15.11 or Theorem 15.61 de- 
pending on which assumptions hold. Now taking iV — >■ cxd, proves the result since the 
right-hand side is uniform for any </> g "H. Q 

We reemphasize that we have not assumed in this section that the numerical 
method is uniquely ergodic. Rather we have shown that any stationary measure of 
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the numerical system is close to the true stationary measure in the p— distance. It 
is, of course, possible in many settings to show that the numerical method is itself 
uniquely ergodic (cf. [151 177] ) . 

Remark 6.2. It follows from Theorem 1 5. 8\ that under its assumptions the error 
pifJ''^ , /i) can be expanded in powers of the time step: 

Piti^,ti) = CflAP + ■■■ + CgAP+'i + O (AP+^+i) . 

Remark 6.3. It is also worth contrasting the result of Theorem \6.1\ with a short 
alternative proof. Let Pt denote the Markov semigroup associated with the SDE and 

with n steps of a numerical method with step size A. Suppose one knows for 
some metric d on probability measures that d(Pt/ii, Pt/12) < Ke^'^*d{fii, ^2) holds for 
all probability measures pi and that for any fixed n, d{P^ p'^ , PnAfJ-^) < CnA^ for 
some constant C„, all A sufficiently small and any measure p^ invariant for P^ . 
Then if one fixes an n so that Ke^^"^ < 1/2 then 

d{p,p^) = d{PMPnP^) < d{PMPnAP^)+diPnAP^,PnP^) 

< ^d(p,fi^) + C\AP 

and collecting the d{fi,ii^) produces the estimate d{ii,p,'^) < 2C„A'' for all A suf- 
ficiently small. The challenge in implementing such a seemingly simple program is 
obtaining the two estimates in the same metric d. Typically, the first estimate is 
available in the total variation distance. On the other hand, the second estimate is 
usually estimated in distances requiring test functions with a number of derivatives. 
The recent works \1 5\ \1 &I allow one to obtain the first estimate in the 1-Wasserstein 
distance which simplifies matching the two norms. Using this strategy on can obtain 
a bound in a stronger norm (such as total variation or 1-Wasserstein metric), but 
the d{P^p,PAnP) convergence rate will often be sub-optimal in these metrics. On 
the other hand, one can obtain d{Ptiii, Ptfi2) < Ke~'^*d{fii, fj,2) with d defined as in 
the metric p above with H — pl/2(p+i),oo ^gij^g that fact that \\Pi(j)\\H < C'||(/)||oo 
from some C and hence ||Pt_|_i0 — 4>\\h < C\\Pt<j} — 0||oo < CKe~'^^\\(l) — 4>\\oo < 
CA'e~'''*||(/) — (jiWu- This gives a result comparable to the hypoelliptic result in Theo- 
rem \6.1\ though misses the smoothing in the elliptic result which is embodied in the fact 
one can use H — W'^P'°° . (Some partial smoothing could have been extracted in the 
hypoelliptic case in Theorem \6.1\ with more care). See 115 j for this program executed 
in a particular setting. 

6.2. Relationship to Stein's method. This section is devoted to outlining 
the similarity between the current setting and Stein's method^ This connection is 
not fully explored in this work but we believe that it is insightful to highlight the main 
idea. Though our goals are different, there are some passing similarities between the 
details in this paper and f9', '4] . 

Let us recall that Stein's method is a generic tool for finding bounds on a dis- 
tance between two distributions which then can give quantitative convergence results. 
Denoting the target distribution as tt, the idea is to find an operator A and a deter- 
mining class of functions Q so that if, for all 5 € ^ one has ^ Ag{x)dnT{x) = 0, then 



*When this work was nearing completion, a conversation with between JCM and Sourav Chat- 
terjee prompted the authors to write this section reflecting on the relation between their approach 
and Stein's method. 
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this implies that tt = TrOGiven any h sufficiently nice, one next solves Stein's equation 

Ag ^ h (6.2) 

with the tacit assumption that the solution g exists and lies in G- Observe that a basic 
solvability condition on the above equation requires that J hdir — when A*Tr = 0. In 
this setting we can consider the equation Ag = h — J hdir for more general, uncentered 
h. In order to quantify the distance of a given measure tt from tt, one tries to control 
/ {Ag){z)dTT(z) by some norm of g which can in turn be controlled by an appropriate 
norm of h. For definiteness let us assume that 



^ iAg){z)dn{z)<e\g\g<eC\\h\\ 

if g satisfies (|6.2I) . Then 

hdn- I hdTr= I {Ag)iz)dn{z) < eC\\h\\ 



By taking the supremum over all /i in a given class with < 1, one obtains control 
over the distance of tt from tt. In particular, if e is small then tt is close to tt. 

This is essentially the methodology we have followed. Taking A to be the genera- 
tor £ of the Markov process, we are ensured that A*tt = if tt is the Markov process' 
stationary measurel^ The basic idea of this note is to show that if £ is a generator 
of another Markov process so that £ ~ £ is small then any stationary measure of the 
second Markov process will be close to tt. We will further assume that tt is ergodic. 

While our paper concerns a mixture of continuous and discrete time, the idea can 
be more easily demonstrated in a continuous time setting. Let Vt and Vt be strong 
Markov semigroups on R*^ with generators £ and £ both defined on some common 
domain D. Let tt and tt be stationary measures for Vt and Vt, respectively. Assume 
that for some set of bounded functions G C D there exists an e > so that 

J{£-£)gdn<e\\g\\g (6.3) 

for any g (z G- Further assume that for some class of bounded, real valued functions 
7i on R'* one can solve £g = h — J hdTT for /i e "H with a solution g d G and 
||.9lle < -f^ll'ill'H for a fixed constant K. Then for all h £H, 

j hdn- j hdTT = j {£g)dn = j {£g)dn + J - ^)9'^ < <^K\\h\\H , 

where in moving from the penultimate expression to the last, we have used £*tt = 0. 
Since the right-hand side is uniform in h we can take the supremum over h. If T-L 
was a rich-enough class to define a metric, we obtain some estimate of the distance 
between the two stationary measures in that metric. 



''As a referee correctly observed, the Echeverria- Weiss theorem is useful in identifying when such 
a condition characterizes an invariant measure and identifying the limiting invariant measure for a 
sequence measures with J Ag{x)dTT^(x) — >■ as n — ^ oo. However, here we are really interested in 
the next order question. How to use the degree to which the characterizing equation is not satisfied 
to obtain quantitative estimates of the convergence rate. 
^"This is done in some versions of Stein's method. See [4]. 
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If one does not have good control over tt then (I6.3P can be difficult to obtain. 
Instead it is often easier to replace (j6.3|) with 



limsup i / Pt{C ~ C)g{x)dt < e\\g\\g (6.4) 

T^oo Jo 

for any g (z G and Tr-a.e. x. As before, we assume that for some class of functions 
H from R'' — )■ R, one can solve Cg — h ~ J hdir for h & T-L with a solution g G G 
such that \\g\\g < K\\h\\-H. Observe that if is a class of bounded functions then the 
second assumption is always satisfied. 

Continuing since PtCg{x) — Pth(x) — J hdn, 

PTg{x)-g{x)^ [ PtCg{x)dt 

JQ 

= Pt{L - C)g{x)dt + I Pth{x)dt-T I hdn . 

JQ Jo JR<i 

Rearranging, dividing by T, and using \PTg{x)\ < \g\oo, produces 

f hdn-^ r ^h{x)dt<^M^ + ^ r P,{C~C)g{x)dt . 

By Birkoff 's ergodic theorem, we know that for 7r-almost every x 

lim — f Pth{x)dt = ( hd-K , 

so from (|6.4I) and \\g\\g < K\\h\\-u we obtain 

hdn — / hdn < eK 1 1 /i 1 1 -h 

for any h eH. Hence in the language of Section IH one has p(7r,7r) < eK. 

This argument can be modified in many ways, the restriction that G and H are 
classes of bounded functions can be removed by assuming some control over Ptg{x) 
uniform in time. If that control can be maintained using a function which is integrable 
with respect to tt then the requirement that n be ergodic can be removed. Although 
we do not fully explore these issues here, we believe that the connections made in this 
subsection are useful. 



7. Variance of the Empirical Time Average. Theorems 15.11 and 15.31 are im- 
portant for implementing time-averaging in computational practice. There are three 
types of errors arising in computing stationary averages: (i) numerical integration 
error (estimated by KA in ((57| and ((5T0| and by KAp in ((5?23| and ((5?28| ): (ii) 
the error due to the distance from the stationary distribution (i.e., the error due to 
the finite time of integration T estimated by K/T in (|4Jl) . ([SJ]) and ([5?23| ): (in) the 
statistical error. The first two errors contribute to the bias of the estimator (f>N (see 
(|5.7|) and ()5.23p ). The error of numerical integration is controlled by the time step 
and the choice of a method. It can be estimated in practice using the Talay-Tubaro 
expansion (see Section [?751 and 03 [IH]) in the usual fashion. The statistical error is 
contained in the second term of (|5.10p and related to the variance of the estimator 
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0Ar (see the details below and also (j4.8p V Both the error due to the finite time of 
integration T and the statistical error are controlled by the choice of the integration 
time T (or what is the same, the choice of the number of steps N under fixed time 
step A). They correspond to properties of the continuous dynamics of the SDE (|2.ip 
and they are almost independent (assuming a sufficiently small A) of a method or 
time step A used. 

Let us consider the statistical error. Theorems 15 . 1 1 and 15 . 2 1 immediately imply that 
Var((?!)7v) = 0(A^ + 1/T). In order to get a more accurate estimate for the variance 
of 4)N, analogous to the one in the continuous case (|4.8|) . i.e., to have the statistical 
error of the estimator controlled by T only, we need to make use a mixing-type 
condition. 

The Markov process X{t) defined on the torus T'^ by the SDE (|2.1[) is uniformly 
ergodic, it has exponential mixing rates (see, e.g., [TTl [28l [5] ) , and there are some 
positive C and 7 such that for any t > and 6* > the inequality 

\E(l){X{t))(l,{X{t + 9)) - Ecb{X{t))Eq^{X{t + e))\ < Ce-^^ (7.1) 

holds. Further, as it has been already mentioned before, it is possible in many settings 
to show that the Markov chain X„ generated by a numerical method is also geomet- 
rically ergodic (cf. [43l SH [27l [20] ) . Then, due to the compactness of the phase space, 
the chain is uniformly ergodic and has an exponential mixing like in (|7.1|) [281 Chapter 
16]. We note that in the cited papers one of the conditions to ensure the ergodicity of 
Xn is the requirement that the numerical method uses random variables with densi- 
ties positive everywhere. We do not address here the question about mixing rate for 
the Markov chains Xn generated by more general numerical methods treated in this 
paper leaving it for further study but instead we assume that the following relaxed 
mixing condition is satisfied for Xn and a 4> € H and Z > : 

\E(t>{Xk)<P{Xk+i) - E0(Xfc)E<^(Xfc+O| < ^^-2 , (7.2) 

(tk+l — tk) 

where if > is a constant independent of A, fc, /. This condition is much weaker than 
(|7.ip but it is sufficient for the proof of the following proposition. At the same time, 
a faster decorrelation than (|7.2p does not improve the estimate (|7.3p . 

Proposition 7.1. In the setting of Theorem \5.1\ and under the condition {7.0^ , 
one has 

Var(0^) < ^, (7.3) 

where T = iVA and K > is a constant independent of A and T . 
Proof. It follows from ()5.5p that 

Var(0jv) < K — (7.4) 

N-lN-l 



7V2 

i=0 j=0 



2 3 

i=0 i=0 
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We have used here that Mi^jy are martingales. Using the estimates for E|Mj jvP from 
TheoremO we get that ELo^IMj.jvP < KT. 

To estimate the terms with Si^N and {C^ — in (17.41) . we expfoit the condition 
(|7.2p . For example, consider the term with Si^n- We have 

A 4 

E\Si,N - ESi,n\^ = — e( {D^MFn,F,,] - Ei?Vn[i^»,F«] 

A4 N-1 

= — ^ e( {D^^/m[F„,F^] - EZ?2^„[F„,F„ 

.4 N-1 N-1 

+ — E E e( - Ei?Vz[i^»,F»]) 

i=0 j=i+l 

X (l?2^,[F„F,]-Ei?V,[^;-,^;] 

JV-l JV-l 

< KA^T + KA^ Y 2 ^ 

1=0 j=i+l (^i ~ 



Analyzing analogously the other terms in (17.41) . we arrive at the stated result. □ 

Although we proved Proposition 17.11 for the method p.ip . it is also valid for a 
more general class of numerical methods from Section 15.21 

In practice one usually estimates the statistical error oi 4>n as follow^. We run a 
long trajectory MT split into M blocks of a large length T — AN each. We evaluate 
the estimators m4'N, to = 1, ... , M, for each block. Since T is big and a time decay 
of correlations is usually fast, mft'N can be considered as almost uncorrelated. We 
compute the sampled variance 

^ - jT^ E (™'^^)' - (m E -^^n) ■ 

m=l m=l 

For a sufficiently large T and M,Ei(f>N belongs to the confidence interval 



with probability, for example 0.95 for c = 2 and 0.997 for c = 3. Note that E (j)^ con- 
tains the two errors forming the bias as explained at the beginning of this section. We 
also pay attention to the fact that D ^ 1/T (cf. (17. 3p ). i.e., it is inverse proportional 
to the product AiV. 

Remark 7.2. Instead of time averaging considered in this paper, stationary 
averages can be computed using ensemble averaging, i.e., by the following estimate for 
the stationary average (j): 

1 ^ 

« mm) « mm) ^v^j^Y. H^^'^'i*)) > 

1=1 



^^Of course, better statistical estimators can be used to quantify the statistical error of the time 
averaging but we restrict ourselves here to a simple one. 
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where t is a sufficiently large time, X is an approximation of X, and L is the number 
of independent approximate realizations. The total error = — consists of 
three parts: the error e of the approximation cf) by 'Ei(j){X (t)) ; the error of numerical 
integration CA^ (here p is the weak order of the method); and the Monte Carlo error 
which is proportional to 1/ \fL. More specifically 

|Bias(0)| = |E0- 0| < KISP ^'£ , Var(0) = 0(l/£) . 

Here, in comparison with the time-averaging approach, each error is controlled by its 
own parameter: sufficiently large t ensures smallness of e; time step A (as well as the 
choice of a numerical method) controls the numerical integration error; the statistical 
error is regulated by choosing an appropriate number of independent trajectories L. 
For further details see ISO]. 
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